Class Imbalance Problem in Data Mining Review

نویسندگان

  • Rushi Longadge
  • Snehalata Dongre
چکیده

In last few years there are major changes and evolution has been done on classification of data. As the application area of technology is increases the size of data also increases. Classification of data becomes difficult because of unbounded size and imbalance nature of data. Class imbalance problem become greatest issue in data mining. Imbalance problem occur where one of the two classes having more sample than other classes. The most of algorithm are more focusing on classification of major sample while ignoring or misclassifying minority sample. The minority samples are those that rarely occur but very important. There are different methods available for classification of imbalance data set which is divided into three main categories, the algorithmic approach, datapreprocessing approach and feature selection approach. Each of this technique has their own advantages and disadvantages. In this paper systematic study of each approach is define which gives the right direction for research in class imbalance problem.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Extracting Predictor Variables to Construct Breast Cancer Survivability Model with Class Imbalance Problem

Application of data mining methods as a decision support system has a great benefit to predict survival of new patients. It also has a great potential for health researchers to investigate the relationship between risk factors and cancer survival. But due to the imbalanced nature of datasets associated with breast cancer survival, the accuracy of survival prognosis models is a challenging issue...

متن کامل

A Review of Class Imbalance Problem

Class imbalance is one of the challenges of machine learning and data mining fields. Imbalance data sets degrades the performance of data mining and machine learning techniques as the overall accuracy and decision making be biased to the majority class, which lead to misclassifying the minority class samples or furthermore treated them as noise. This paper proposes a general survey for class im...

متن کامل

A Review on Imbalanced Learning Methods

Nowadays learning from imbalanced data sets are a relatively a very critical task for many data mining applications such as fraud detection, anomaly detection, medical diagnosis, information retrieval systems. The imbalanced learning problem is nothing but unequal distribution of data between the classes where one class contains more and more samples while another contains very little. Because ...

متن کامل

Class Imbalance Problem in Data Mining using Probabilistic Approach

Class imbalance problem are raised when one class having maximum number of examples than other classes. The classical classifiers of balance datasets cannot deal with the class imbalance problem because they pay more attention to the majority class. The main drawback associated with it majority class is loss of important information. The Class imbalance problem is a difficult due to the amount ...

متن کامل

Alleviating the Class Imbalance problem in Data Mining

The class imbalance problem in two-class data sets is one of the most important problems. When examples of one class in a training data set vastly outnumber examples of the other class, standard machine learning algorithms tend to be overwhelmed by the majority class and ignore the minority class. There are several algorithms to alleviate the problem of class imbalance in literature. In this pa...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1305.1707  شماره 

صفحات  -

تاریخ انتشار 2013